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We propose a method to merge several paleoclimate time series 
into one that exhibits a consensus on the features of the individual 
times series. The paleoclimate time series can be noisy, nonuniformly 
sampled and the dates at which the paleoclimate is reconstructed can 
have errors. Bayesian inference is used to model the various sources of 
uncertainty and smoothing of the posterior distribution of the consen- 
sus is used to capture its credible features in different time scales. The 
technique is demonstrated by analyzing a collection of six Holocene 
temperature reconstructions from Finnish Lapland based on various 
biological proxies. Although the paper focuses on paleoclimate time 
series, the proposed method can be applied in other contexts where 
one seeks to infer features that are jointly supported by an ensemble 
of irregularly sampled noisy time series. 

1. Introduction. Paleoclimatological proxy data, such as pollen, tree 
rings or ice cores, considered to be sensitive to past surface temperature 
variations can provide a continuous and long record of climatic changes 
where long-term instrumental data are lacking [Jansen et al. (2007)]. Pa- 
leoclimatological data are essential to place limited instrumental records in 
perspective and to assess the importance of forcing factors. However, it is im- 
portant to realize that proxy records are indirect measures of climate change 
that often reflect changes in multiple aspects of climate [e.g., Legrande et al. 
(2006); Tingley et al. (2012)]. Each proxy inevitably has its advantages and 
limitations, and different proxies may yield information on different aspects 
of climate. For example, they may be sensitive to different seasonal signals, 
have different response times, and respond directly or indirectly to climate. 
It is therefore not surprising that, for example, temperature reconstructions 
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Fig. 1. The six Holocene mean air July temperature reconstructions for Northern 
Fennoscandia used in the consensus analysis. The vertical axes show temperature in centi- 
grade (°C) and the horizontal axes are calibrated years before present. 



based on different proxies can produce somewhat different results, despite 
the fact that they reflect a common underlying truth. One would therefore 
like to have a method that could capture, in a principled manner, those 
aspects of different reconstructions that find strongest support among most 
of them, that is, establish a "consensus" on the underlying features of the 
reconstructions . 

To demonstrate the method suggested in this paper, we will find a consen- 
sus among the six Holocene, that is, post Ice Age mean air July temperature 
reconstructions shown in Figure 1. The reconstructions are based on three 
biological proxies analyzed from two lakes in Finnish Lapland and, as one 
can see, they differ from one another considerably, both in the overall tem- 
perature levels and in the details. The data behind the reconstructions and 
the consensus features the proposed method finds will be discussed in detail 
in Section 3, but let us first consider here some ad hoc methods that are 
often used to combine information across these types of paleoclimate time 
series. Such straightforward analyses are demonstrated in Figure 2. In the 
upper panel the reconstructions have been centered and then stacked into a 
single plot. A smooth has also been computed and it can be interpreted to 
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Fig. 2. Simple methods to establish a consensus between temperature reconstructions. 
Upper panel: all six reconstructions of Figure 1 centered and stacked together (blue) and 
a local linear regression smooth (red). Lower panel: averages of cubic spline interpolants 
(blue) and local linear regression smooths (green) of the centered reconstructions. Local 
linear regression smooths employ a Gaussian kernel and bandwidths computed using a 
method from Ruppert, Sheather and Wand (1995). 



represent the consensus temperature anomaly, that is, deviation from mean. 
In the lower panel the centered reconstructions have been averaged after 
first interpolating them with cubic splines or, alternatively, by smoothing 
them with local linear regression. While simple plots like these may reveal 
some features of the consensus anomaly, they clearly leave many questions 
unanswered. Individual time series are noisy, as both the reconstructed tem- 
peratures and the dates they are thought to correspond to contain errors. 
Such simple methods also tell us nothing about the uncertainty in the sug- 
gested consensus features that the presence of noise inevitably introduces. 
Further, the underlying signal may exhibit interesting features in many dif- 
ferent time scales and a single smooth or mean probably cannot capture all 
of them well. 

In climate science, a popular approach to reconstruct large-scale past cli- 
mate variation is to combine a number of individual proxy records using the 
so-called Composite Plus Scaling (CPS) method [e.g., Jones et al. (2009) 
and the references therein)]. In this method, a collection of proxy records is 
standardized and averaged after which the average is recalibrated against an 
available instrumental record of a particular environmental variable, such as 
temperature. In the calibration process, various regression techniques can 
be used to match an average of annually resolved proxy records with mod- 
ern instrumental data. The method proposed in this paper works differently 
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in that the individual reconstructions are not explicitly standardized or av- 
eraged and their consensus is found using an estimation process that does 
not directly rely on a modern instrumental record. Note that, contrary to 
the situation with annually resolved proxies such as tree rings, in the case 
of biological proxy records considered here only a few of the reconstructed 
temperatures would fall in a period for which instrumental measurements 
might be available, making regression based calibration unfeasible. 

Our proposal to consensus analysis is a Bayesian approach that consists 
of two steps. First, given a set of reconstructions, we find their consensus 
by viewing the reconstructions as data in a hierarchical model that takes 
into account the uncertainties involved. In the second step we use scale 
space smoothing to reveal the salient features of the consensus in different 
time scales. The proposed approach was first outlined in Korhola et al. 
(2006) and Holmstrom et al. (2008) and it can be viewed as an extension 
to multiple time series of the BSiZer methodology that has already found 
use in quantitative paleoecological analyses [Erasto and Holmstrom (2005, 
2006, 2007); Holmstrom (2010a); Weckstrom et al. (2006)]. 

It can be argued that a better way to model the propagation of errors 
into the consensus would be to work directly with Bayesian temperature 
reconstructions instead of using a Bayesian model to combine non-Bayesian 
reconstructions, as is done here. However, while Bayesian models may be 
becoming more commonplace, the vast majority of existing reconstructions 
are in fact non-Bayesian, based on various regression techniques, both para- 
metric and nonparametric. See, for example, Birks (1995) and Birks et al. 
(2010) for extensive reviews of the kind of methods typically used in con- 
nection with diatoms, pollen, chironomids and other biological proxies. The 
method proposed here is therefore immediately widely applicable as a sig- 
nificant improvement over the simplistic ad hoc summaries commonly used 
to represent a consensus of such reconstructions. 

To our knowledge, the first papers to describe a detailed Bayesian mod- 
eling approach to biological proxy based paleoclimate reconstruction are 
Vasko, Toivonen and Korhola (2000), Toivonen et al. (2001) and Korhola 
et al. (2002), who all used chironomid taxon abundances in lake sediments 
as temperature proxy. Their approach was further analyzed by Erasto and 
Holmstrom (2006) and more recently by Salonen et al. (2012). Bayesian 
reconstruction based on pollen abundances was described in Haslett et al. 
(2006). All these papers model explicitly the response of a biological proxy 
to temperature changes and reconstruct the temperature from taxon fossil 
abundance data in a single proxy record. More recently, a Bayesian hierar- 
chical model was used by Brynjarsdottir and Berliner (2011) to reconstruct 
climate for the past 400 years from several bore hole temperature profiles. 

The approach suggested in Li, Nychka and Ammann (2010) is perhaps 
closer to the one proposed here in that a number of local reconstructions 
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are combined to create a single temperature reconstruction, in their case for 
the whole northern hemisphere and the last 1000 years. As in the present 
paper, a biological proxy (pollen) enters the reconstruction process only as a 
temperature time series and not as raw taxon abundances, which would con- 
stitute the original data. In addition to pollen, tree rings and bore hole tem- 
peratures are also used in their model and external forcings are accounted 
for as well. However, no real proxy data are used and instead the proxy 
records are simulated on the basis of numerical climate model outputs. The 
reconstructions we aim to combine were obtained using taxon abundance 
data from actual sediment cores. Note that the same climate model simula- 
tion that was used in Li, Nychka and Ammann (2010) is employed also in 
the present paper but only to elicit a prior density for the consensus recon- 
struction. Other differences include the somewhat more general error models 
considered here, explicit modeling of dating uncertainty and the scale space 
approach to inference. 

In Section 2 we describe our method, assuming first fixed dates for the 
reconstructed temperatures (Section 2.1) and then allowing dating errors in 
the analysis (Section 2.2). The idea of using multi-scale smoothing to cap- 
ture temperature variation in different time scales is explained in Section 2.3. 
The analysis of the consensus features in the six Holocene temperature re- 
constructions is presented in Section 3 and Section 4 offers a discussion of 
the main points of the paper. The Matlab functions used in the main com- 
putations are provided in Erasto et al. (2011b). 

2. The method. 

2.1. Fixed dates. The method that we will describe can be used to ana- 
lyze reconstructions of any continuous variable, but as our main interest is 
in the Holocene temperature, we frame the following description in terms of 
temperature reconstructions. Thus, consider m reconstructions yi, . . . ,y m of 
past temperatures, where y^ = {yki, ■ ■ ■ ,Ukj k ] T are the estimated past tem- 
peratures from the A;th proxy series and let = [t)~i, ■ ■ ■ ,tkj k ] be the associ- 
ated radiocarbon dating based chronology. Here tfci < • • • < tkj k so that y^i 
and ykj k are the reconstructions for the oldest and the youngest dates, re- 
spectively. We assume that the reconstructions are from a relatively limited 
geographical area so that they can be thought to reflect common underly- 
ing temperature variation and it is this common variation that we seek to 
capture. 

In the example we will consider the reconstructions are based on fos- 
sil records in sediment cores obtained from subarctic lakes. Even when the 
cores come from a limited area, due to, for example, different lake altitudes, 
the overall temperature levels and therefore the mean temperatures in the 
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reconstructions can vary considerably. We therefore consider only temper- 
ature anomalies, centering each reconstruction by subtracting its mean 
(1/jfc) ^2j k yki from all components i/ki- These centered time series represent 
reconstructions of past temperature anomalies (variation about the mean) 
and we attempt to capture the statistically significant (or "credible") fea- 
tures in what can be interpreted as the consensus of these anomalies in the 
general area where the core lakes are located. The features in the consensus 
that we are interested in are locations of maxima, minima and trends, all 
of which are not affected by centering. To avoid the introduction of new 
notation, we denote the centered reconstructions still by y&. 

The consensus anomaly is modeled as a curve /x(t), where t E [a, b] is 
a time interval that includes all chronologies from all proxy records. We 
actually assume that /i can be described by a natural cubic spline with 
knots at the points tkj l ■ Such a spline is uniquely determined by its values 
at the knots because they determine the interpolating spline uniquely [Green 
and Silverman (1994)]. The fact that this spline space is finite dimensional 
greatly simplifies our analysis. 



be the set of distinct dates, in increasing order, in all chronologies t^. Since 

all tkis need not be different, we have that n < j± -\ 1- j m . The anomaly 

curve is modeled as a natural cubic spline with values (ii = n(ti) at the knots 
t,i. Thus, instead of we can from now on work with the finite dimensional 
vector /*=[//!,..., fi n ] T of past anomalies at times ij. 

Now, let /ifc be the part of /x that corresponds to the chronology of the 
kth. reconstruction y^. We assume that 



where Sk has the multivariate normal distribution N(0, Sfc) with an unknown 
covariance matrix S^. Our model therefore allows time- varying, correlated 
reconstruction errors that can also have different magnitudes for different 
proxies and cores. Such a model is supported by the exploratory analysis 
reported in Erasto et al. (2011a). We further assume that the anomalies are 
conditionally independent given the parameters /x and {S^.} = {Si, . . . , S m } 
so that the likelihood of the data y = [yf , . . . , ym] T , given these parameters, 
is 



Let 



m 



(1) 




k 



(2) 



Yk — f^k + £ k 



in 



(3) p(y| M ,{S fc })ocIJ|S 



A; 



1/2 



exp -^(yfe-Mfe) S fc (y/c-Mfc) . 



k=l 
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As a prior distribution for H k we use an Inverse Wishart distribution, 

(4) P CZ k \W k ,v k ) oc |S fe |-^+^+ 1 )/ 2 exp[-itr(W fc S- 1 )], 

a standard choice in connection with a multivariate normal likelihood. As 
there seldom is any prior knowledge of a particular error correlation struc- 
ture, we typically use a diagonal prior scale matrix and select the degrees 
of freedom v k so that the prior (4) is rather vague, allowing nondiagonal pos- 
terior covariances. The relative magnitudes of the diagonal elements of 
could also be used to model the increased level of difficulty of temperature 
reconstruction for the older sediment layers [Erasto and Holmstrom (2007)] . 
The Sfc's are assumed to be independent a priori so that 

m 

(5) P({£fc}) =p({Sfc}|{W fc ,i/ fc }) = n^SfelWfc,^). 

fc=i 

We have also experimented with a more complex model that allows re- 
construction error correlations between different proxy records. Let again 
y = [yj, . . . ,y^] T be the vector of length j\ + • • • + j m that contains all 
reconstructions. The more complex model considered assumes that 

(6) p(y\fi, E) a isr^expf-^y - G/xfS^y - Gfi)], 

where G/x is a modification of the consensus fj, where some components 
Hi appear several times to account for the fact they correspond to dates 
in the joint chronology that appear in more than one reconstruction. The 
covariance matrix E again has an inverse- Wishart prior 

(7) p(E|W» oc |S|- ( ^ +J ' +1)/2 exp[-itr(WS- 1 )], 

where now j = ji + ■ ■ - + j m an d W is the diagonal matrix whose diagonal 
elements are those of the matrices Wi,...,W m . The results reported in 
the paper all pertain to the model (3) and the more complex model (6) is 
discussed in Erasto et al. (2011a). 

For the consensus anomaly fi we use a smoothing prior that penalizes for 
roughness as measured by the variability of its components, 

(8) p(HAo,t) oc A^- 2) / 2 exp(-^/x r K^ . 

In this formula, K is a symmetric positive semidefinite matrix such that 

(9) /, T K/x= l\ii"(t)] 2 dt 

J a 

and Ao > 0. Thus, the roughness in the prior (8) is measured by the second 
derivative of the natural cubic spline that interpolates the values fi at the 
knots ti and the level of roughness penalty is controlled by Aq [Green and 
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Silverman (1994)]. The power (n — 2)/2 in the scaling factor reflects the 
rank of the matrix K which is n — 2. Note that the smoothing prior (8) 
imposes dependence between the temperature anomalies fi k derived from 
these proxies. This is natural because the reconstructions are assumed to 
reflect common underlying temperature variation. 

The parameter Ao describes our prior beliefs about the smoothnesss of \i. 
We consider it unknown with prior uncertainty described by a Gamma dis- 
tribution. In principle, point estimation such as cross-validation can be used 
to choose suitable values for the prior distribution parameters [Erasto and 
Holmstrom (2005)], but we prefer here a choice that produces a poste- 
rior mean of fx of reasonable roughness. The important thing is to avoid 
choosing Ao too large because then the finest details of fi might be lost 
[Erasto and Holmstrom (2005, 2007)]. 

The joint posterior distribution of all the unknown parameters in the 
model is now obtained from the Bayes' formula, 



where all the distributions on the right-hand side were defined above. Gibbs 
sampling can be used to generate a sample from this posterior distribution. 
An estimate of the consensus anomaly that is consistent with the data and 
our prior beliefs, together with its uncertainty, is described by the marginal 
posterior distribution p{^\y), which then can be approximated by the /x- 
component of this sample. The model (6) is handled similarly. 

2.2. Random dates. In the previous section we assumed that the recon- 
structed temperature anomalies yti could be associated precisely with the 
dates tki- In reality, however, the core chronologies are derived from radio- 
carbon dating based estimates, a process that is not error-free. Taking into 
account this source of uncertainty can be important when one tries to make 
inferences about the common features in several temperature time series 
with different associated chronologies. 

Let tfc = [tki, ■ ■ ■ ,tkj k ] again be the radiocarbon dating based chronology 
for the kth. reconstruction. Allowing for the fact that the dates tki have er- 
rors, we assume that they and the dates Tki in the true, unobserved chronol- 
ogy, satisfy tki = Tki + Ski, where 5ki represents an error. Denote the true 
chronology for the kth reconstruction by = [Tki, ■ ■ ■ ,T~kj k ]- We assume 
that both sequences and are strictly increasing. Note that, for k ^ k', 
Tk and Ty may well contain some dates that are known to be the same. 
This is the case, for example, when k and k' correspond to two different 
proxies analyzed from the same core and using the same sediment samples 
for both. Let 



(10) 



p(/x,{£ fe },A |y,t) ocp(X )p({S k })p(fj,\X ,t)p(y\fji,{Sk}) 



in 
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be the set of distinct dates in all chronologies Tk, k = 1, . . . ,m [cf. (1)]. As 
with the dates t k i in the previous section, since all t^'s need not be different, 
we have in general that n < j\ + • • • + j m . The observed dates t k i for equal 
Tfc;'s are assumed to be also equal and we denote by t = {t\, . . . ,t n } the set 
of tfcz's corresponding to r. Our model for these distinct dates now is 

(12) t i = T i + 5 i , 

i = 1, . .. , n, and we assume that, given the parameters Tj, the <Vs are in- 
dependent zero mean normal variables with known variances ipf > 0. The 
variances that we will use are based on the standard errors associated with 
the chronologies (cf. Section 3.2). The likelihood of the observed dates t 
from (12) is 

n 

(13) p(t\r) = p(t\r, {tf}) <x n^" 1 exp 

i=i 

where {ipf} = {tpf, ■ ■ ■ ,ipn}- We set a prior distribution on the tj's that en- 
forces the correct temporal order of the chronology within each reconstruc- 
tion, 

m 

(14) p(r) OC JJl(Tfci <T k2 < ••• <T kjk ). 

k=l 

Let now rm < • • • < rr n \ be a permutation of r into an ascending order. 
The consensus anomaly is then modeled as natural cubic spline fi(r) with 
knots at the points 7?j) , uniquely determined by the vector fi = \p,i, . . . , /i„] T ', 
fii = fj,(r({\). The subsequent model details are exactly the same as in the 
previous section with the exception that in the prior (8) of /x, the matrix K 
now depends on r. The joint posterior (10) becomes 

P(A») { s fc} 5 ^o, r|y, t) oc p{Xo)p{{^ k })p(T)p(fi\X , r) 

(15) 

xp(y|A»,{s fc })p(t|T). 

A hybrid algorithm that uses Gibbs and Metropolis-Hastings Monte Carlo 
sampling can be used to generate a sample from this posterior distribution 
[e.g., Robert and Casella (2005)]. The proposal density for is N(0, lO -2 ^?). 
Again, the model (6) can be handled similarly. For easy reference, Table 1 
summarizes the quantities defined in this and the previous section. 

2.3. Scale space feature analysis. The two previous sections showed how 
to estimate the consensus of several temperature reconstructions. This sec- 
tion explains how to find its credible features in different time scales. The 
key idea is that of a scale space. This concept has its roots in computer 
vision, but it has recently inspired a host of new statistical data analysis 
tools based on multi-scale smoothing. For an overview of these methods we 
refer to Holmstrom (2010b). 
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Table 1 

Glossary of symbols used, their associated likelihoods or priors and the full conditional posteriors of 
the estimated parameters. The multivariate normal distribution N(/x ,So) in the conditional posterior of 
(J, is obtained as the product of (3) and (8) and it is discussed in Appendix B. In the conditional posterior of 
r we denote 9 = diag(^i, . . . ,ip„) [cf. (13)] and the proposal density for n is N(0, W~ 2 ipi) 
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In the context of this article, the scale space approach amounts to using 
smoothing to make inferences about the credible, or "statistically signifi- 
cant," features of the consensus anomaly \i underlying the data. Thus, sup- 
pose that 5a is a smoothing operator associated with a smoothing level A > 
and let [i\ = S\fi be the corresponding smooth of \x. In the classical scale 
space literature [e.g., Lindeberg (1994)], the smoother S\ would typically be 
a Gaussian convolution (moving average with Gaussian weights) with con- 
volution kernel width (the averaging window) determined by A. However, in 
the statistical literature other smoothers are often used. 

The idea is to make inferences about the features of \x\ for a range of 
smoothing levels A. Each [i\ is interpreted to reveal features of /i at a certain 
time scale, little smoothing (small A) showing the short time scale variation 
and heavy smoothing (large A) revealing the coarsest features, such as the 
overall trend. We are, in particular, interested in the maxima and minima of 
H\ and therefore base our inferences on the derivative because its sign tells 
where the local trend is positive or negative. For Bayesian reasoning we need 
the posterior |y,t). However, as the spline [i is uniquely represented by 
the vector \i of its values at the knots, we may instead consider a smoothing 
matrix Sa, the smooth /x A = S\fi, and then use another matrix D [e.g., 
Green and Silverman (1994)] to evaluate the derivative fj>' x at some fixed 
dense set of time points Si < • • • < s r , 

(16) Dji A = [^( Sl ),...,/4(a r )] T . 

The smoothing matrix used in our scale space feature analysis is defined as 
Sa = (1 + AK)" 1 and it actually smooths a discrete set of points /i by fitting 
a smoothing spline [Green and Silverman (1994)]. Instead of p(fi' x \y,t), one 
can now analyze the posterior distribution p(DSA/x|y,t). For fixed dates, 
a large sample can first be generated from p(/i|y,t) and then transformed 
by multiplying the sample vectors by the matrix DSa- Inference about the 
features of /i at the time scale A is then based on this sample. With random 
dates, the scale space analysis needs samples from both /i. and t, as the 
smoothing matrix Sa depends on r through K. 

Note here the difference between the parameter Ao used in constructing 
the consensus and the parameter A in scale space feature analysis: Ao de- 
scribes our prior beliefs about the underlying consensus /i, whereas different 
values of A are used to explore the features of /i in different time scales. The 
choice of prior distribution for Ao is discussed in Section 3.3.2. We also em- 
phasize that all inferences on the features of \x are made in a simultaneous 
fashion, over all time points Sj in (16). Therefore, instead of just examin- 
ing the statistical significance of individual slopes /J-(sj), the credibility of 
whole patterns of trends are established. For more details on the inference 
procedures used we refer to Erasto and Holmstrom (2005). 
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3. Holocene temperature variation in Finnish Lapland. 

3.1. The data used. We demonstrate the proposed method by finding 
the consensus among six temperature reconstructions based on high resolu- 
tion lake sedimentary data (50-70 year intervals) of three biological proxies 
from two sites (Figure 1). The two lakes, Toskal and Tsuolbmajavri, se- 
lected for analysis are located at a climatically sensitive tree-line region 
of Finnish Lapland. They both contain fossil records of three fundamental 
climate proxies, pollen, chironomids (nonbiting midges) and diatoms (uni- 
cellular micro-algae) from the same sediment cores. The sediments of such 
remote lakes at high altitudes and latitudes are perhaps one of the few sys- 
tems where a continuous, high resolution record of terrestrial environmental 
variability, uninfluenced by human impact throughout the post-glacial, can 
be found. 

Past temperatures were reconstructed using regional training sets of lakes 
for pollen, chironomids and diatoms (304, 62 and 64 lakes, resp.) and a 
regression based reconstruction technique referred to as weighted averaging 
partial least squares (WA-PLS) [ter Braak and Juggins (1993)]. The model 
relates the modern mean July temperatures at the training lakes to the 
abundances of various proxy taxa preserved in the top (0-1 cm) surface 
sediments that represent the last few years of sediment accumulation. The 
past air temperatures are reconstructed by substituting in the regression 
model the taxon abundances found in the sediment cores from the two lakes 
selected for analysis. This approach is based on the assumption that each 
taxon has a certain optimal temperature at which it fares particularly well 
and that, therefore, the relative abundances of taxon fossils in a sediment 
layer reflect the temperature at the time the sediment layer was formed. 
For more details regarding the training sets and reconstruction models, see 
Seppa and Birks (2001), Seppa et al. (2002) and Weckstrom et al. (2006). 

The sediment records are supported by chronologies based on multiple 
AMS 14C determinations [Seppa and Birks (2001); Seppa et al. (2002)]. As 
the chronology inevitably contains errors, an attempt is made to take this 
uncertainty into account by using the model described in Section 2.2. Table 
S.2 in Erasto et al. (2011a) gives all the data used in our consensus analysis: 
the sediment depths, calibrated ages and their standard errors as provided by 
the dating laboratory, as well as pollen-, chironomid- and diatom-based July 
mean temperature reconstructions for the lakes Toskal and Tsuolbmajavri. 

3.2. Chronology errors, prebinning. The combined chronology (1) in- 
cludes several pairs of dates with only a few years apart. The spline in- 
terpolant used in representing the consensus temperature anomaly as a con- 
tinuous function n(t) can exhibit unnatural wiggles between such nearby 
dates and we therefore aggregated the dates into 15 year wide bins. The 
chronology standard errors of aggregated dates could then be averaged, but 
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Fig. 3. Standard errors of the combined binned chronology of the two sediment cores 
(blue). Average standard error is plotted when two or more dates coincide after binning. 
Also shown is a local linear smooth that was used in defining the parameters tpi of the 
dating model likelihood (13). 

we actually decided to smooth all of them as shown in Figure 3 and com- 
puted the parameters ipi in (13) from the values of this smooth. It retains 
the most important feature of the dating errors, namely, that they increase 
considerably when older sediment layers are considered. These approxima- 
tions seem reasonable given the large standard errors associated with the 
dates and the rather simplistic dating error model (12) used. 

3.3. Priors for reconstruction errors and roughness. 

3.3.1. Reconstruction error. The prior distribution (4) of H k has the 
mean E(Sfc) = (y k — j k — l) _1 Wfc, where j k is the dimension of the kth 
reconstruction y^. We use a diagonal scale matrix = w k Ij k such that 
E(Sfc) = o\lj k , where a\ is an estimate for the upper bound of reconstruction 
error variance. Appendix A suggests a method to derive such upper bound 
estimates and the values obtained are given in Table 2. Since now o^Xj k = 
{yk ~ jk ~ ^)~ l w k lj k , we must have w k ja\ = v k - j k + 1. We set w k = 0.5 
for all k which corresponds to degrees of freedom v k between 77.9 and 163.1 
and makes the priors rather vague. 

The posterior values of the diagonal elements of the matrices Yl k turned 
out to be significantly smaller than their prior values. As this may suggest 
that the values a k are too large (and thus truly only upper bounds), we also 
included in our analyses a second set of error covariance priors by using the 
value a k = 0.2 for all reconstructions. In this case we opted for a tighter prior 
by taking w k = 50 which corresponds to between 1319 and 1410 degrees of 
freedom in the priors. 



14 



ERASTO, HOLMSTROM, KORHOLA AND WECKSTROM 



Assuming smaller errors naturally leads to more features in the consensus 
analysis being nagged as credible. However, the independent evidence for 
some of these features discussed in Section 3.5 can be interpreted as lending 
some credence to these smaller reconstruction errors. Trying out different 
error sizes makes sense also because it probably is not possible to estimate 
them very reliably in the first place. Exploring temperature features for 
different error levels could also be thought as a form of scale space analysis 
where increasing error levels corresponds to more smoothing. In the following 
we refer to these two prior settings as "large" and "small" errors. 

3.3.2. Roughness. The parameter Ao in (8) is used to describe our prior 
belief about the variability or "roughness" of the time series of past temper- 
atures. In choosing a prior for Ao, very long instrumental records going back 
hundreds of years might be useful. However, the longest records in Finland 
span only about 150 years, a period that includes only 2-4 chronology dates 
for the six reconstructions considered, thus making roughness estimation im- 
possible. We therefore decided to use a numerical climate model simulation 
in setting the prior roughness level. 

A 1150 year long annual mean July temperature series for Northern Fin- 
land, extending from AD 850 to 1999, was extracted from the NCAR Climate 
System Model simulation described in Ammann et al. (2007). The time series 
is shown in Figure 4 (blue curve) . The six reconstructions should actually be 
thought of as 30-year averages of mean July temperatures, sampled at dates 
included in their associated chronologies. For visual comparison between the 
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Fig. 4. Simulated mean July temperature anomaly for Northern Finland between AD 850 
and 1999 (blue curve) together with the 30-year running mean (red curve). The vertical 
axis is the temperature anomaly in centigrade (°C) and the horizontal axis is the calendar 
year. 
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Fig. 5. The six Holocene mean July temperature reconstructions for Northern Fennoscan- 
dia restricted to the time interval from AD 850 to 1999 (blue curves) together with the 
simulated 30-year means computed at the same time points (red curves). The light blue 
band around each reconstruction is based on error bars of size ±2rjfc, where the Ok 's are 
given in Table 2 of the Appendix. The vertical axes show temperature anomaly in centi- 
grade (°C) and the horizontal axes are time before present in years. Note the different 
temperature scales in the figures. 

simulation and the reconstructions we therefore applied a 30-year moving 
average to the simulated anomaly (red curve in Figure 4) and then sampled 
the average at the dates in the reconstruction chronologies. The results are 
shown in Figure 5. As one can see, the reconstructions are at least as rough 
as the simulation. It therefore appears that at least some prior smoothing 
indeed is required in the consensus analysis which motivates the use of a 
smoothing prior (8) for the consensus. Further, if the simulation is taken to 
represent the actual temperature variation, the reconstruction errors are not 
very large. The light blue band around each reconstruction is based on error 
bars of size ±2(7fc, where the cf^'s are given in Table 2 of the Appendix. 

To design a prior for Ao, one can use the simulated time series also for 
more formal roughness estimation. Given a time series fx, one can measure 
its roughness by the quantity R(fJ-) = /x T K/x in the exponent of (8). For the 
simulated 30- year running mean, evaluated at the joint chronology dates (1) 
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Fig. 6. Posterior distribution of the roughness measure R(fJ.) = /x T K/i for large (left 
panel) and small (right) prior errors. The histograms are based on 2000 sample values and 
the dashed line indicates the roughness of the simulation. 



contained in the interval from AD 850 to 1999, we have R(/j,) = 2.1 • 10~ 4 . 
Using the prior Gamma(20, 0.5) for Ao, the posterior mean of R(^) is 2.2 • 
10 -4 and 2.5 • 10~ 4 for the large and small prior errors, respectively. In both 
cases the mean posterior roughness of the consensus is therefore slightly 
larger than that of the simulations which, as indicated in Section 2.1, is 
desirable in order not to smooth too much before scale space analysis is 
carried out. We therefore used Gamma(20, 0.5) as the prior distribution for 
Ao- Figure 6 shows the posterior distribution of R(fi) for both large and 
small prior error settings with the roughness of the simulation depicted as a 
dashed line. By testing other reasonable alternatives we also concluded that 
neither the mean nor the width of the prior distribution of Ao has a major 
effect on the estimated consensus features. 



3.4. The consensus and its credible features. Scale space analyses of the 
consensus anomaly with large and small prior reconstruction errors are 
shown in Figures 8 and 9, respectively. The top panel shows the recon- 
structed temperature anomalies (dots) together with the posterior mean 
of the consensus (blue curve). The middle panel shows the posterior mean 
again together with three smooths E(fj,\\y,t) of the posterior consensus cor- 
responding roughly to multi-decadal (light blue), centennial (purple) and 
millennial (yellow) time scales (cf. Section 2.3). Comparing with the ad hoc 
methods discussed in the Introduction, we observe that there is a qualitative 
correspondence between the smoothing based curves of Figure 2 (red and 
green curves) and the centennial level posterior means of our scale space 
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Fig. 7. Upper panel: sample curves of fix (green) together with the posterior mean 
E(/Ltx|y>t) (blue). Lower panel: corresponding samples of fj! x and the posterior mean 
E(/i^|y,t). The color bar on the bottom depicts posterior sample based inference on the 
sign of f± x . For more information, see the text. 

analyses as well as between the mean of the spline interpolants (lower panel, 
blue curve) and our multi-decadal posterior mean. 

The bottom panel is a feature credibility map where the vertical axis 
represents the smoothing level A (in logarithmic units), that is, the time scale 
at which the features are examined. The smoothing levels corresponding to 
the three smooths of the middle panel are indicated by horizontal lines of 
the same color. A pixel at a location (sj, A) is colored blue or red depending 
on whether the slope of the smoothed anomaly fi\ is credibly negative or 
positive. Thus, blue and red indicate cooling and warming, respectively, 
at the particular time Sj and scale A considered. Flagging of negative and 
positive slopes is based on their joint posterior probability which is required 
to exceed a given threshold a, typical values used being in the range [0.8, 
0.95]. Gray color indicates that the sign of the slope is not credibly different 
from zero. 

Figure 7 is a schematic illustration of how the map is drawn, focusing 
on the interval from 2729 to 2604 years before present and a multi-decadal 
smoothing level A. In the upper panel, a few sample curves of fix (green) to- 
gether with the posterior mean K(fj,\\y,t) (blue) are shown. The lower panel 
shows the corresponding samples of //, and the posterior mean E(//|y, t). 
The color bar on the bottom depicts posterior sample based inference for 
the chosen fixed value of A, where, with posterior probability at least a, 
the derivative of fj,\ is positive or negative on the intervals indicated by 
red and blue, respectively, and the probability is computed jointly over all 
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Fig. 8. Scale space analysis of the consensus of six temperature reconstructions. The 
top panel shows the reconstructions (dots) and the posterior mean of the consensus (blue 
curve). Large reconstruction errors were assumed and the credibility level a = 0.8. The 
middle panel shows the posterior mean of the consensus together with three smooths of the 
posterior consensus corresponding roughly to multi-decadal (light blue), centennial (purple) 
and millennial (yellow) time scales. The bottom panel is the credibility map where blue and 
red indicate credible cooling and warming, respectively. For more information see the text. 



time points Sj in these intervals. The full map, such as in the middle panels 
of Figures 8 and 9, is obtained by stacking such color bars, for the whole 
Holocene and for all scales A considered. 
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Fig. 9. Scale space analysis of the consensus of six temperature reconstructions. Small 
reconstruction errors were assumed and the credibility level a = 0.8. For more information 
see the caption of Figure 8 and the text. 



As in our earlier scale space analyses of the paleoclimate, the credibility 
level was chosen as a = 0.8 [e.g., Erasto and Holmstrom (2005, 2007, 2006); 
Weckstrom et al. (2006)]. Increasing the level, say, to 0.95, slightly shrinks 
the credible features (blue and red areas) but does not affect much the 
interpretation given in Section 3.5. The a = 0.95 versions of all consensus 
credibility maps are included in the supplement [Erasto et al. (2011a)]. 
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Fig. 10. Consensus based on subgroups of the six temperature reconstructions considered. 
Large reconstruction errors are assumed and the credibility level is 0.8. In the top row, the 
Lake Toskal map is based on all three proxy records obtained from that lake and similarly 
for Lake Tsuolbmajavri. The other three maps show the consensus according to each proxy 
when the corresponding proxy records from each lake have been combined. The bottom 
panel of the second column is a more detailed analysis of how each reconstruction affects 
the overall consensus within a particular time interval on a millennial time scale. For more 
information see the caption of Figure 8 and the text. 



It is interesting to study also the effects on the consensus of the two lakes 
and the three proxies separately. Such an analysis is presented in Figure 10, 
where credibility maps for the lakes and the proxies based on large recon- 
struction errors are displayed. One can also analyze the role of each of the 
six reconstructions more quantitatively by considering their mean contribu- 
tions to the posterior consensus. Appendix B proposes such an approach 
and to demonstrate the idea, we examined more closely the early Holocene 
warming suggested in the credibility map of Figure 9. The bottom panel 
of the second column of Figure 10 shows the mean contribution of each re- 
construction to the slope of the consensus at a millennial time scale (yellow 
curve in Figure 9), from the beginning of the Holocene to 7000 years before 
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present. Such a plot can be useful when one wants to focus the analysis on 
a particular feature in a limited time window. 

The results of Figures 8-10 are based on /i-samples of size 4000 where the 
first 2000 were used for burn-in. Generating such a sample on a standard 
PC takes about 10 hours. A uniform grid of about 2000 time points Sj and 
a logarithmic grid of 200 smoothing levels A were used in the scale space 
analyses. With random dates it takes about 10 hours to process a batch of 
10 smoothing levels. Computations can be sped up by allocating the batches 
to different processors. Parameter convergence was checked visually. Initial 
values were picked from the priors for those parameters that are updated by 
Gibbs sampling and the carbon dating based values were used to initialize 
the chronologies. The posterior error covariances were almost diagonal but 
heteroskedastic with small off-diagonal elements. The chronologies changed 
only little in the simulation. The standard error of a radiocarbon date is 
commonly interpreted as a standard deviation of a normal distribution cen- 
ter at the date [cf. (13)]. To test the robustness of dating error assumptions, 
we repeated some of our analyses assuming either a much smaller (down to 
zero) or a much larger (up to several times the value used in the reported 
analyses) standard error, but the features proposed by the maps stayed the 
same. For very large standard errors this is due to proposals in the MCMC 
simulation being mostly rejected. 

3.5. Interpretation of results. 

3.5.1. Consensus features. According to the credibility maps of Figures 8 
and 9, overall cooling is the longest time scale feature of Holocene summer 
temperature in northern Finland, indicated by the continuous blue color 
in the topmost part of the maps. This is thought to be mostly due to the 
earth's changing orbital geometry around the sun. At millennial scales (yel- 
low lines in the maps), the consensus summer temperatures exhibit some 
other key aspects of Holocene climate evolution, such as an early Holocene 
warming trend shown strongly in Figure 9 and weakly in Figure 8, together 
with a peak warming at around 8 kyr BP (8000 years before present) indi- 
cated by red changing to blue, followed by a monotonic cooling trend (blue 
color) until the present time. This overall pattern is predominantly driven 
by annual mean and summer orbital forcing at the high northern latitudes 
[Berger and Loutre (1991)]. In the Northern Hemisphere summer months 
the incoming solar radiation (insolation) peaked between 11 and 9 kyr BP 
[Kutzbach (1981)], when insolation was approximately 7-9% higher than at 
present at 70° N, and gradually declined since then. The relatively cool sum- 
mer temperatures in the early Holocene (rising trend before 9 kyr BP) in the 
consensus hence refer to a slightly delayed timing of the Holocene Thermal 
Maximum (HTM) relative to this peak summer insolation, suggesting that 
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the climate response to the orbital forcing must also be affected by some ex- 
tra forcings and internal feedbacks in the climate system [Chapin III et al. 
(2000)]. The cool conditions in the earliest Holocene were apparently heav- 
ily influenced by the last substantial remnants of the large Fennoscandian 
and Laurentide continental ice sheets that trigged changes in ocean heat 
transportation and surface albedo [Kaplan and Wolfe (2006); Renssen et al. 
(2009)]. 

According to our consensus reconstruction, HTM in northern continental 
Europe occurred at around 8-9 kyr BP, when the inferred summer tem- 
perature values clearly exceeded the modern levels. This early peaking of 
Holocene warmth contradicts several earlier studies that place the timing of 
peak warming across a wide area of northern Europe closer to mid-Holocene 
at around 6 kyr BP [Davis et al. (2003); MacDonald et al. (2000); Kaufman 
et al. (2004)]. Evidence for the mid-Holocene thermal maximum in northern 
Europe comes largely from a northward and upward expansion of northern 
treelines, as well as from retreating glaciers [Jansen et al. (2007)]. However, 
a recent global assessment of treeline response to climate warming suggests 
that treeline advance may be more strongly associated with winter, rather 
than summer, warming [Harsch et al. (2009)]. In addition, in many parts 
of Scandinavia, glaciers started to retreat in the early Holocene, soon after 
the transient cooling event, termed the Finse event [8.5-8.0 kyr BP; Nesje 
et al. (2008)]. The early expression of peak summer warming identified in 
the present study is further consistent with a recent model simulation study 
[Renssen et al. (2009)], where maximum summer warmth in the northeast 
of Europe was placed closer to 8 kyr BP. 

At multi-decadal to centennial scales (light blue and purple lines in the 
maps), climate variability as highlighted in our small-error analysis (less so 
with large reconstruction errors) shows a complex picture with indications 
of repeated warm and cold climate episodes, the specific causes of which are 
not fully understood. Some of the peaks found in our record seem to be co- 
herent with the Holocene series of North Atlantic ice-rafting events defined 
by Bond et al. (1997) within the dating uncertainties (±100 to 200 years). 
These include the weak temperature minima at around 1.4, 2.8, 4.2 and 
around 10.3 kyr BP, whereas the remaining mid- and early Holocene "Bond 
events" are not evident in our record. Neither can we find any event-like 
feature around the classical 8.2 kyr BP cooling event [Alley et al. (1997)], 
although the most pronounced decline in overall Holocene summer tempera- 
tures started in our record around this time (see above). Examination of the 
maps at the smallest smoothing levels shows credible fluctuations in summer 
temperature, in particular, between 7.0 and 5.0 kyr BP and from 3.0 kyr 
BP to the present, while more stable conditions occurred between 5.0 to 3.0 
kyr BP and in the early Holocene. Solar variability is the most plausible 
explanation for the temporal dynamics of these short-term changes. Indeed, 
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recent work utilizing spectral analysis of radionuclide records suggests that 
the solar cycles were particularly prominent during the time intervals 6.0-4.5 
kyr BP and 3.0-2.0 BP, whereas this periodic behavior faded during other 
time intervals [Knudsen et al. (2009)]. Hence, the high- variability intervals 
in our record coincide with the periods of intensive solar cycles, which in 
turn correlate with periods of significant reorganization of the ocean and at- 
mospheric circulation in the North Atlantic region [Mayewski et al. (2004); 
Seidenkrantz et al. (2007)]. 

Our scale space consensus analysis (in particular, the credibility map of 
Figure 9) indicates that the Northern Fennoscandia summer climate expe- 
rienced a succession of warming and cooling events during the most recent 
part of the Holocene, broadly similar to those documented earlier in North- 
ern Hemisphere temperature reconstructions, including the Current Warm 
Period (CWP), Little Ice Age (LIA) and Medieval Climate Anomaly (MCA) 
[Jansen et al. (2007); Mann et al. (2008)]. The MCA commenced around 1.3 
kyr BP and terminated around 0.8 kyr BP when temperatures started to 
decrease toward the LIA. Conditions slightly warmer than those of the 20th 
century may have prevailed in the North Atlantic climate regime during the 
MCA as deduced on the basis of our analysis. The peak medieval warmth 
is around 1.2 kyr BP in our record, which is earlier than in many previous 
published reconstructions, but is in accordance with Mann et al. (2008) who 
place the MCA between AD 1450 and AD 700. The LIA in our consensus re- 
construction occurred perhaps between ca. 0.5 and 0.15 kyr BP (about AD 
1500-1850), in agreement with the recent Arctic- wide synthesis of proxy 
temperature records [Kaufman et al. (2009)]. The recent warming (CWP) 
shows as a credibly positive temperature trend in centennial scales. 

3.5.2. Contributions from the proxies and the lakes. Looking at the lake- 
and proxy-specific credibility maps of Figure 10, we note first that, of the 
three proxies, the pollen-based reconstructions suggest most features with 
somewhat fewer credible features exhibited by the chironomid and the di- 
atom records. All three agree on a Holocene- wide cooling trend which there- 
fore becomes part of the overall consensus. Still, on millennial scales (yellow 
line), the cooling trend after about 4 kyr BP in the chironomid record is a 
bit less certain than in the two other proxies. It is notable that evidence for 
early Holocene warming and the HTM in the overall consensus appears to 
come from the pollen record only. The millennial scale detail analysis shown 
in the bottom panel of the second column of Figure 10 clearly confirms this. 
The fact that in the large-error analysis of Figure 8 these show only weakly is 
probably due to the relatively large pollen reconstruction error upper bounds 
used for this analysis (cf. Table 2). The LIA is clearly visible as a credible 
temperature minimum only in the diatom record. However, combined with 
the cooling trend immediately prior to it, which is present also in pollen 
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and chironomid reconstructions, the LI A signal in diatoms is strong enough 
to show in the consensus, too. The Bond events (cf. Section 3.5.1) are sup- 
ported in varying degrees by different proxies. The warm MCA appears to 
be better reconstructed by chironomids than pollen. The recent centennial- 
scale rise in temperatures exhibited in the consensus is driven mostly by 
the diatom record with the chironomids showing millennial scale warming 
during the last 2000-3000 years. 

Considering the credibility maps in the first row of Figure 10, we notice 
that the records from the two lakes both support overall Holocene cooling 
and the LIA (although only barely for Toskal), whereas only Lake Toskal 
shows weak evidence for early Holocene warming. In light of the detail anal- 
ysis of Figure 10 (lower right-hand corner panel), it appears that the strong 
millennial scale warming signal in the Lake Tsuolbmajavri pollen record is 
drowned by negative contributions from the chironomid and diatom recon- 
structions. Still, as noted above, when evidence in all records is included, the 
warming signal is strong enough to show in the overall consensus. Finally, 
we observe that only the Lake Tsuolmbajavri record suggests the presence 
of the MCA and that opposite features in the lake records at around 4 kyr 
BP may be the source of centennial-scale oscillations in the consensus during 
5-3 kyr BP (purple curve in the middle panel of Figure 8). 

4. Discussion. Given a collection of noisy reconstructions, the proposed 
method uses Bayesian inference to find those features of past climate vari- 
ation that are supported by their consensus. Although only temperature 
was considered, other climate variables could be handled similarly. Further, 
while the reconstructions considered in this paper were based on radiocar- 
bon dated sediments samples, the method is conceivably applicable to other 
proxy types that use different dating methods such as tree rings, varved lake 
sediments, ice cores and speleothem archives, where estimates of dating er- 
rors are available [see Jones et al. (2009) for a discussion of these and other 
proxy types]. In case of annually resolved records such as tree rings, the fixed 
dates version of the method might suffice. Also, although the paper focuses 
on an application to paleoclimate reconstruction, the method developed is 
likely to find use also in other contexts where a combination of information 
across several noisy time series is of interest. 

Handling of dating errors in our consensus model could probably be con- 
siderably improved. A sophisticated Bayesian dating error model, BChron, 
was introduced in Haslett and Parnell (2008). Other recent proposals in- 
clude, for example, Blaauw et al. (2003), Blaauw and Christen (2005) and 
Bronk Ramsey (2008). The problem of modeling the relationship between 
sediment depth and age was also analyzed in Telford, Heegaard and Birks 
(2004) and Heegaard, Birks and Telford (2005), and aligning multiple varve 
chronologies was considered in Auestad et al. (2008). Dating error models de- 
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veloped for spatial problems could also be useful; see, for example, Fanshawe 
and Diggle (2011) and Cressie and Kornak (2003). Still, while we readily ac- 
knowledge that the error model described in Section 2.2 may be too crude 
to reflect all aspects of uncertainty in the dating process, it nevertheless can 
serve as a first approximation that allows, in principle, the effect of dating 
errors to enter the posterior uncertainty of the consensus anomaly. In future 
work we hope to incorporate in the analysis ideas from more sophisticated 
error models such as Bchron. Such an improvement in the analysis might be 
incorporated also in a system that uses Bayesian reconstructions to begin 
with. We leave these ideas for future work. 

Another direction of development would be to include the spatial depen- 
dencies between the proxy records in the model. With only two core locations 
considered in our example, this is not relevant, but it might be useful when 
more locations are included in the consensus analysis. 

We proposed to use climate simulations to gain insight into the variabil- 
ity of the past temperature. Of course, the simulation we used covers only a 
fraction of the approximately 10,000 years considered in the reconstructions 
and, therefore, in the analyses described in Section 3.3.2, one considers tem- 
perature roughness only for about 10% of the whole Holocene period. Still, 
although the mean temperature levels for the last 1150 years may be differ- 
ent from those during the rest of the Holocene, it may not be unreasonable 
to assume that the inter-annual temperature variation has not changed dra- 
matically. By studying the simulated 30-year mean for the last 1150 years 
we may therefore gain at least some idea of its roughness during the whole 
Holocene. In a sense, such an assumption could be viewed as being somewhat 
analogous to the basic premise underlying proxy-based paleoclimate recon- 
structions, namely, that the relationship between the proxy records and the 
climate has not changed over thousands of years. 

To summarize, the method described in this paper provides a means to 
estimate the consensus temperature variation in heterogenic time series and 
also to capture its salient features, such as maxima, minima and trends 
in different time scales in a statistically principled manner. Our model al- 
lows dating uncertainties, distinct or overlapping core chronologies, as well 
as time-varying, correlated reconstruction errors that can also have differ- 
ent magnitudes for different proxies and cores. We believe that the method 
has also wider applicability potential in data mining of various types of cli- 
mate records and compiled time series. When applied to lake data series 
from northern Finland, a millennial-scale cooling trend was found since the 
Holocene thermal maximum at around 8 kyr BP associated with the decrease 
in orbitally driven summer insolation. Superimposed on the millennial-scale 
trends, the summer climate in northern Finland was punctuated by several 
quasicyclical climate events, the forcing mechanisms of which are not yet 
fully understood. Our scale space analysis also suggests that inconsistencies 
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in climate reconstructions and their interpretations may be at least partly 
spurious; there is probably no single narrative that counts as the canonical 
version of Holocene climate change. Instead, there are many interpretations 
depending on the proxy and the resolution at which the data are gained and 
examined. Finally, while the paper focuses on paleoclimate time series, the 
proposed method can be applied in other contexts where one seeks to infer 
features that are jointly supported by an ensemble of irregularly sampled 
noisy time series. 

APPENDIX A: ESTIMATION OF THE RECONSTRUCTION ERROR 

We explain here how the temperature anomalies were used to estimate 
upper bounds for the reconstruction error variances. 

Assuming that y^ ~ N(/x fc , a^Ij k ), the distribution of the random vari- 
able V k = ||yfc || 2 = yfy/c is determined by the parameter k = (fi kl a k ). We 
consider a fixed value a k > and the null hypothesis 

#0 : ©0 = {0 k = (H k , <Tk) I Mfc G M m , a k > cf k } 

against the alternative 

H 1 :@ 1 = {6 k = ( Mfc , a k ) | fi k € R m , <r k < a k }. 

The null hypothesis is rejected if V k < v k , where v k is some fixed value. It 
is shown in Holmstrom and Erasto (2001) that the significance level of this 
test is given by 

(17) (3 = P(xl-i<v k /a 2 k ), 

where Xj k —i 1S a chi-square variable with j k — 1 degrees of freedom. Setting 
/3 = 0.05, an upper bound for a k can therefore be estimated as 

o k = \Jv k /x 2 jk - lfim , 

where Xj k -i o 05 ^ s ^ ne percentile of the x 2 -distribution with j k — 1 de- 
grees of freedom. These values are listed in Table 2 for the six proxy records 

Table 2 

Estimates of upper bounds of reconstruction 
errors for the 6 proxy records considered 



Proxy record <Jk 



Lake Toskal chironomids 0.32 

Lake Toskal diatoms 0.27 

Lake Toskal pollen 0.68 

Lake Tsuolbmajavri chironomids 0.59 

Lake Tsuolbmajavri diatoms 0.21 

Lake Tsuolbmajavri pollen 0.71 



CONSENSUS OF PALEOCLIMATE RECONSTRUCTIONS 



27 



and they were used to define the large-error prior scale matrices in the 
consensus analysis. 



APPENDIX B: CONTRIBUTIONS OF INDIVIDUAL PROXY 
RECORDS TO THE CONSENSUS 

It follows from (3) and (8) that 



where it is understood that and are extended to an n x n matrix and 
an n-dimensional vector, respectively, by putting zero entries to locations 
that correspond to those time points in the full joint chronology t that do not 
appear in the chronology of proxy record k. It follows that the components 
of the posterior mean vector E(SoS A T 1 y / t|y,t) can be used to quantify the 
contribution of record k to the posterior of /x at the time points t±, . . . ,r n . If 
S\ and D are the matrices defined in Section 2.3, the contribution of record 
k to the slope of the smooth fi' x at the time points s\, . . . , s r [cf. (16)] can 
then be analyzed by considering the mean of E(DSASoXl^ 1 yfc|y,t), instead. 
This is the quantity depicted for each reconstruction in the bottom panel of 
the second column of Figure 10. 
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SUPPLEMENTARY MATERIAL 

Supplement A: Additional analyses and the data used 

(DOI: 10.1214/12-AOAS540SUPPA; .pdf). The document (a pdf-file) re- 
ports exploratory analyses of the estimated reconstruction errors, shows ad- 
ditional credibility maps, and provides the data analyzed in the article. 



/x|{S fc }, A ,r,y, 



t~N(/x ,£ ) 



where 




and 




Supplement B: The Matlab code (DOI: 10.1214/12-AOAS540SUPPB; 
.zip). The Matlab code (in a zip-file) used to compute the results of the 
article. 
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